Cardiff / Caerdydd
CF11 9LJ
Tel / Ffôn: 029 2032 0500
Fax / Ffacs: 029 2032 0600
Textphone / Ffôn testun: 029 2032 0660
info@audit.wales / post@archwilio.cymru
www.audit.wales / www.archwilio.cymru
Nick Ramsay AM
Chair of the Public Accounts Committee
National Assembly for Wales
Cardiff Bay
Cardiff
CF99 1NA
Reference: HVT/2852/caf
Date issued: 8 June 2018
Dear Nick
Given that the Committee is still in the process of finalising its inquiry on NHS informatics services, I thought I should bring to your urgent attention some information my staff have picked up in relation to the resilience of major NHS IT systems in Wales. We have recently learnt that a number of NWIS-maintained systems have suffered “major incident” failures over the last 12 months. The situation is helpfully summed up in a paper recently presented to the Velindre NHS Trust board, which I have enclosed.
The incidents of 24 January 2018 and 21 March 2018 appear to have caused particular concern. They affected a number of key clinical systems and created significant local problems for NHS bodies in maintaining routine clinical activity while the systems were down. I am aware that Abertawe Bro Morgannwg University Health Board has raised concerns about these two incidents and sought assurances from the Chief Executive of Velindre NHS Trust, although we have not yet seen any response. We are not aware of patients coming to harm as a result of the system failures, although it can reasonably be expected that the issues will have contributed to negative patient experiences and significant frustration on the part of NHS staff. Indeed, as I write this letter I can see that news of the IT problems is already starting to filter through to the media. I understand that NWIS has taken immediate action to investigate and address each of the system failures. However, apparent delays in the service receiving formal investigation reports from NWIS mean that it is not yet possible to pinpoint the cause of the problems. Anecdotal evidence suggests that the system failures result from a range of different factors, possibly relating to underlying infrastructure issues.
Page 1 of 2 - Incidents affecting availability of national clinical information systems - please contact us in Welsh or English / cysylltwch â ni’n Gymraeg neu’n Saesneg.
I must stress that these issues have come to light since I reported my findings on NHS informatics systems earlier this year, and it is important to note that this work did not examine issues relating to the resilience of systems or business continuity arrangements. As such I am not able to draw on any specific audit work in this area. However, from the information we do have it has struck me that there could be some read across to the wider systemic challenges identified in my recent report and the Committee’s subsequent inquiry. As such I thought it was appropriate that I draw the Committee’s attention to these issues so that it can consider whether or not to extend its inquiry in some way to include these significant concerns.
Yours sincerely
Huw Vaughan Thomas
Auditor General for Wales
Enc: Briefing on CANISC Major Incidents and Business Continuity
TRUST BOARD
BRIEFING ON CANISC MAJOR INCIDENTS AND BUSINESS CONTINUITY
Meeting Date: 30th May 2018
Authors: Ann Marie Stockdale and Lisa Miller
Sponsoring Director: Andrea Hague and Mark Osland
Report Presented by: Andrea Hague
Committee/Group who have received or considered this paper: None
Trust Resolution to: (please tick) APPROVE / REVIEW / INFORM / ASSURE
Recommendation: The Trust Board are asked to note the content of this paper and to request that the Executive Management Board consider the options in more detail, so that a final option can be taken forward as a matter of urgency.
This report supports the following Trust objectives as set out in the Integrated Medium Term Plan: (please tick)
Equitable and timely services
Providing evidence-based care and research which is clinically effective
Supporting our staff to excel
Safe and reliable services
First-class patient/donor experience
Spending every pound well
This report supports the following Health & Care Standards: http://www.wales.nhs.uk/governance-emanual/health-and-care-standards
Staying Healthy; Safe Care; Effective Care; Dignified Care; Timely Care; Individual Care; Staff and Resources
1. Introduction / Background
1.1 This paper has been produced to brief the Trust Board on ongoing major incidents affecting the Canisc system at Velindre Cancer Centre (VCC).
1.2 A number of IT systems and supporting infrastructure in use at Velindre Cancer Centre provide functions critical to the safe delivery of patient care. […] record and administration system, which was developed over 20 years ago.
1.3 This paper focuses specifically on incidents categorised as major at a local and/or national level. ServicePoint is the tool used to record such incidents for resolution, and Datix incident reports are completed in line with Trust policy.
1.4 These national clinical and administrative critical systems are hosted by the NHS Wales Informatics Service (NWIS) in the Newport and Blaenavon Data Centres.
1.5 Major incidents affecting the availability of national clinical critical systems, including but not limited to Canisc and the Welsh Laboratory Information Management System (WLIMS), have increased in frequency, with a total of 11 incidents occurring since 17 April 2018. These are summarised below, along with the NWIS definitions:
Date | System | Timescale
17 April 2018 | National Canisc Incident | 12:23 to 15:39
24 April 2018 | National Data Centre Incident (Blaenavon) | 20:30 to 14:00 on 25 April 2018
30 April 2018 | National Canisc Incident | 13:23 to 13:46
30 April 2018 | National Welsh Laboratory Information Management System (WLIMS) | 15:18 to 17:30
2 May to 8 May 2018 | National Canisc Incident | 22:00 to 14:45
14 May 2018 | National Canisc Incident | 09:10 to 19:00
14 May 2018 | National Welsh Laboratory Information Management System (WLIMS) | 11:58 to 18:11
15 May 2018 | National Welsh Laboratory Information Management System (WLIMS) | 14:45 to 18:22
16 May 2018 | National Canisc Incident | 07:50 to 11:30
17 May 2018 | National Canisc Incident | 08:30 to 20:01
22 May 2018 | National Canisc Incident | 08:00 to 13:40
Definition of Major Incidents (NWIS)
Calls - Priority 1 (P1): P1s are one or more Incidents which cause significant business impact, resulting in the loss of a critical clinical or administrative service within one or more core sites. These would typically be incidents which:
Cause unavailability of the Service, a key module or a major function; OR
Cause incorrect processing of data or errors in a major Software function;
AND affect multiple Authority Party Sites.
Calls - Priority 2 (P2): P2s are one or more Incidents which cause localised business impact, resulting in the loss of normal clinical or administrative service to one site or multiple non-core sites. These would typically be incidents which:
Cause disruption to a number of Users, or unavailability to a single user within a core site, of the Service, a key module or a major function; OR
Cause incorrect processing of data or errors in a Software function;
AND affect some users at multiple Service Recipients or all users at a single site.
1.6 The summary above reflects the incident resolution timescale only, not the recovery timescale or associated cost specific to the Cancer Centre.
1.7 At the time of writing, initial NWIS feedback has attributed the majority of incidents to underlying infrastructure issues. However, a formal response has not yet been received.
1.8 The Trust IG&T Committee has raised concerns with the NWIS Directors about the delay in receiving formal incident investigation reports. Please see Appendix 1 for details.
1.9 In terms of impact, these incidents have resulted in:
Lack of information to prescribe chemotherapy/radiotherapy
Delays/risk of error in prescribing chemotherapy/radiotherapy
Inability to access blood results, essential to support decision making in relation to patient treatment/care
Decisions made without all information to hand, and a requirement for retrospective checks, leading to an increased likelihood of error
Poor patient experience (… of time)
Potential impact on clinical staff attrition/retirement
Added stress to an overburdened workforce
Negative staff experience, which could affect retention where it is hard to recruit (e.g. in light of medical staff shortages)
Consultants preparing for every clinic (often in personal time at home) in the expectation that IT systems cannot be accessed, adding to stress levels
1.10 All incidents were immediately reported to the NHS Wales Informatics Service, who then commenced their internal investigation in line with their service management processes.
1.11 Velindre Cancer Centre implemented its business continuity plan, including regular emergency communication with key staff.
2. Timing:
2.1. This paper advises Trust Board members of the current position. The Executive Management Board (EMB) will discuss the options in more detail once the financial implications and implementation timescales are known. The EMB will then monitor progress and provide regular reports to the Trust Board.
2.2. The current business continuity plan cannot be sustained and, more importantly, does not provide a full solution during an incident. Urgent action is therefore required.
3. Description:
3.1. During the week commencing 14th May 2018, the Trust IG&T Committee received a report on major incidents and discussed recent IT system issues. The Committee considered the risks to the service and the impact on patient experience and staff wellbeing. The Chair also reported that this issue had been raised many times during her induction walkaround at VCC.
3.2. VCC has business continuity plans in place. Because incidents are so frequent, these plans are regularly tested in real time. They combine providing and printing the paper record with access to some other systems, such as WPAS.
3.3. The implementation of the Welsh Clinical Portal, Chemocare and LIMS has enhanced continuity arrangements. However, reliance on CANISC as a Patient Administration System (PAS) remains a risk, because these systems do not replace its full functionality.
3.4. It is accepted that there is a programme to replace CANISC, but at this stage there is no definitive timescale for completing this work.
3.5. VCC SMT now require more robust and enhanced continuity arrangements to be implemented as a matter of urgency. This will mean reprioritising work, with the understanding that some work will need to cease so that the relevant departments can focus on this.
An assessment of the available options is set out below. However, at this stage the timescales and financial implications of Options 3 and 4 are unknown. The ADI and the Head of Information at VCC are progressing these with relevant colleagues. Once these are known, the full option appraisal will be provided to the EMB.
3.5.1. Option 1: Do nothing; best endeavours on a case-by-case basis.
3.5.2. Option 2: Revert to recording all information in the physical medical record. It is recognised that this is inefficient, but it will ensure that all Canisc case note information is available to all staff as required, enabling safe care.
3.5.3. Option 3: A read-only copy of the Canisc database to be mirrored at Velindre Cancer Centre. This option would need to be undertaken with NWIS. In the event of a network incident at the Cancer Centre itself, the read-only copy of Canisc would become unavailable.
3.5.4. Option 4: Development of the Velindre Cancer Centre data warehouse to include data extracts specific to new patient and follow-up annotations.
3.6. Alongside the above options, three key pieces of work are required to support business continuity for the Cancer Centre:
3.6.1. Canisc Case Note Summary to be made available in the Welsh Care Record Service;
3.6.2. Document Management System (VCC documents available in the Welsh Care Record Service); and
3.6.3. Welsh Clinical Portal link to the e-Master Patient Index.
The Medical IT Lead has identified the above as the minimum required to enable safe care. This work must take place alongside whichever option is taken forward.
4. Financial Impact:
4.1. Unknown at this stage.
5. Quality, Equality, Safety and Patient Experience Impact
5.1. There is no evidence to suggest that patients have been harmed, but it is evident that patient experience is poor when such incidents occur.
5.2. There is acknowledgement of a high-profile UK case in which a computer failure required a doctor to obtain results over the telephone, and a cognitive error was made. This led to the junior doctor […] lessons learnt in relation to current Trust services and pressures.
6. Considerations for Board / Committee
6.1. Trust Board are asked to note the contents of this paper and task the EMB with taking forward the most appropriate option as a matter of urgency.
6.2. […] for both VCC and outreach clinics remains in place, which requires case notes to be printed or saved in an alternative format. This clearly has a significant resource implication and has led to some tasks or work ceasing in order to prioritise this service continuity plan.
7. Next Steps:
7.1. Trust Board are asked to consider this paper.
7.2. ADI and VCC Head of Information to finalise options with financial and timescale information.
7.3. Chief Executive to ensure that the options are discussed at EMB and that an appropriate recommendation is resourced and achieved.
Appendix 1
Quarter 1 Update
During Quarter 1 2017, a total of six major incidents were reported. The status in relation to each incident is detailed below.
INC 01: LIMS National Incident, 24 May 2017
Status: Incident resolved. Awaiting Major Incident Report from NWIS, due September 2017.
INC 03: LIMS National Incident, 20 June 2017
Status: Incident resolved. Awaiting Major Incident Report from NWIS, due October 2017.
INC 04: Canisc Interface Feeds, 19 June 2017
Status: Incident resolved. Awaiting Major Incident Report from NWIS, due October 2017.
INC 06: Canisc National Incident, 20 June 2017
Status: Incident resolved. Awaiting Major Incident Report from NWIS, due October 2017.
INC 07: Velindre Cancer Centre Network, 11 July 2017
Status: Incident resolved. Root cause analysis completed (see February 2018 report).
INC 08: Network Issues at Cardigan Leisure Centre (WBS Donation Clinic)
Status: Incident resolved. No further incidents of a similar nature. Equipment to test local Wi-Fi resilience has been purchased and will be used when undertaking risk assessments of potential new venues for donation clinics.
Quarter 2 Update
INC 01: Radiology Investigation Reports unavailable in Canisc, 22 August 2017
Status: Incident re-opened in January 2018. Due to issues with setting up the end-to-end test environment, limited resources and operational issues, NWIS have delayed testing of the Radis/Canisc bug fix until 09.04.18. Root cause analysis in draft.
INC 02: Unavailability of Network (National Incident), 15 September 2017, 09:30
Status: Incident resolved. Awaiting Major Incident Report from NWIS, due December 2017.
INC 03: Unavailability of Network (Local Incident), 15 September 2017, 14:30
Status: Incident resolved. Root cause identified at time of incident and de-escalated. Investigation completed and reported in DATIX.
Quarter 3 Update
No incidents
Quarter 4 Update
INC 01: Unavailability of Network (National Incident), 24 January 2018, 11:45
Status: Incident resolved. All Wales Major Incident declared following connection issues with both national data centres. A range of clinical and admin critical systems were unavailable for approx. 4 hours. Systems affected included CANISC, WCP, WLIMS (Cardiff & Vale UHB), all Trust email services and WHTN phone lines. Whilst still operational, the WBS experienced performance issues with its core ePROGESA system, as well as some label and report printing issues. The loss of email meant that donor registry-to-registry messaging (used for donor/patient matching) was unavailable for the period of the outage. The outage directly impacted patient services in VCC. For example, radiotherapy patients were delayed, and staff were unable to access test results or to schedule and manage outpatient attendances. NWIS have confirmed the cause as an equipment failure in the Newport Data Centre following delivery […] submitted to Welsh Government. Major Incident Report awaited. The Trust is awaiting further feedback on the root cause and future mitigation(s).
INC 02: Unable to Access Canisc (National Incident), 1 February 2018
Status: Incident resolved. Connection problem identified when users attempted to launch Canisc via the national Blaenavon Data Centre. Major Incident Report awaited.
INC 03: Users Unable to Access WBS Online Booking System (National Incident), 20 March 2018 at 18:05
Status: Incident resolved. A planned change to national firewalls by NWIS (ServicePoint ref: 57202) impacted services using the outer Demilitarized Zone (DMZ), which includes the WBS online booking system. Users were unable to access the website for approx. 2 hours. WBS / Velindre were not notified of the planned change, nor was the online booking system […] ongoing with NWIS to understand the root cause and ensure that WBS / Velindre services are appropriately referenced within the NWIS Service Catalogue.
INC 04: National System Outage, 21 March 2018 at approx. 16:00
Status: Incident resolved. A network issue in the Blaenavon Data Centre (BDS) resulted in the failure of a number of clinical and admin critical systems for approx. 2 hours. Trust email services were affected. In VCC, access to XXXX was restricted, which resulted in XXXX. Full connectivity was restored by NWIS shortly after 18:30. The NHS Wales Informatics Service […] further feedback on the root cause and future mitigation(s).
INC 05: National Loss of Access to WLIMS, 29 March 2018, 15:56 to 17:10
Status: Incident resolved. All Wales incident preventing users from accessing LIMS. Messages were not flowing, so users were unable to send test requests or receive test results. The root cause was identified as a change to increase capacity on the Lab Database, which impacted the Citrix servers.